Conversation
guangy10
left a comment
@chmjkb Can you add a test for the Whisper model under optimum-executorch/tests/? The test should be straightforward; we only need to test two things: 1) export to ET, and 2) an e2e task using the PTE via the HF API.
Once we have the test, @michaelbenayoun can help kick off the CI.
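A rough sketch of what such a test could look like. This is not the PR's actual test: the class name `ExecuTorchModelForSpeechSeq2Seq`, the `recipe="xnnpack"` argument, and the `generate` call are assumptions modeled on optimum-executorch's API for other tasks.

```python
# Sketch only, not the PR's test. Class name, recipe value, and generate
# signature are assumptions modeled on optimum-executorch's other tasks.
import unittest

import torch

from optimum.executorch import ExecuTorchModelForSpeechSeq2Seq


class TestWhisper(unittest.TestCase):
    def test_whisper_export_and_e2e(self):
        model_id = "openai/whisper-tiny"
        # 1) Export to ExecuTorch: from_pretrained exports the model and
        # loads the resulting PTE program.
        model = ExecuTorchModelForSpeechSeq2Seq.from_pretrained(model_id, recipe="xnnpack")
        self.assertIsNotNone(model)
        # 2) e2e task via the HF-style API: run generation on a dummy
        # log-mel spectrogram of shape [batch, feature_size, nb_max_frames].
        input_features = torch.rand(1, 80, 3000)
        tokens = model.generate(input_features)  # assumed signature
        self.assertGreater(len(tokens), 0)
```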
Hi @guangy10, I rebased and added the test, lmk if that looks good 👍🏻
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
assert encoder_input_ids.shape == torch.Size(
    [1, 80, 3000]
), f"Whisper only accepts a log-mel spectrogram of shape [1, 80, 3000], passed shape: {encoder_input_ids.shape}"
```
@chmjkb This is the config for whisper-tiny only, right? If I switch to a different Whisper variant, e.g. whisper-large-v3, it won't work. I think we can load this dynamically from preprocessor_config.json?
IIUC, each dim in encoder_input_ids represents [batch_size, feature_size, nb_max_frames]?
Yeah, I think Whisper large is an exception: it takes in 128 features instead of 80. Will fix that. (The smaller ones should work with 80.)
Actually, I'm thinking about getting rid of this assertion. Instead, I could just resolve the correct shape when instantiating example_encoder_input_ids (I'm doing this anyway). If a user passes the wrong shape for some reason, so be it.
Also, WhisperEncoder itself raises a ValueError when the length of the features is not correct.
WDYT?
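For reference, a minimal sketch of resolving the shape dynamically, assuming the standard transformers feature extractor attributes (`feature_size`, `nb_max_frames`):

```python
# Minimal sketch: derive the example encoder input shape from the model's
# preprocessor config instead of hardcoding [1, 80, 3000]. Assumes the
# WhisperFeatureExtractor attributes feature_size and nb_max_frames.
import torch
from transformers import AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained("openai/whisper-tiny")
# feature_size is 80 for most Whisper variants and 128 for whisper-large-v3;
# nb_max_frames is 3000 (30 s of audio at the default hop length).
example_encoder_input_ids = torch.rand(
    1, feature_extractor.feature_size, feature_extractor.nb_max_frames
)
```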
```python
if isinstance(self.full_model, WhisperForConditionalGeneration):
    dynamic_shapes = None
else:
    # Define dynamic dimension for encoder output sequence length
    encoder_seq_len_dim = torch.export.Dim("encoder_hidden_seq_length", max=self.max_hidden_seq_length)
    dynamic_shapes = {
        "decoder_input_ids": None,
        "encoder_hidden_states": {1: encoder_seq_len_dim},
        "cache_position": None,
    }
```
@tugsbayasgalan @pianpwk Can we use Dim.AUTO here, to avoid setting dynamic_shapes explicitly for different models? In this case, Whisper would expect all static shapes, and T5 would want encoder_hidden_seq_length to be dynamic at least.
Yep, Dim.AUTO would be perfect here. That way, you don't need the if/else branching.
I tried changing the code to the following:
```python
dynamic_shapes = {
    "decoder_input_ids": None,
    "encoder_hidden_states": {1: torch.export.Dim.AUTO},
    "cache_position": None,
}
```
Unfortunately, when I do that, the export fails with the following error:
```
RuntimeError: Cannot evaluate the shape upper bound of a dynamic-shaped tensor to a concrete bounded integer. Got tensor spec: TensorSpec(dtype=torch.float32, shape=[1, s0, 384], layout=torch.strided, is_sparse=False, shape_dynamism=1, const=False, requires_grad=True).
The upper bound shape we get [1, int_oo, 384], the upper bound stride we get [int_oo, 384, 1]
This tensor could either be from 1. a data-dependent operation such as nonzero. Or 2. an input, whose don't have a constraint for the upper bound.
Please use export's constrain_as_size() or constrain_as_value() apis and set a concrete upper bound to resolve this.
```
However, doing:
```python
dynamic_shapes = {"input_ids": {1: torch.export.Dim.AUTO}}
```
for the Whisper encoder seems to work; it makes the T5 export fail, though :D
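The error above points at the underlying constraint: the ExecuTorch lowering wants a concrete upper bound for every dynamic dimension, and Dim.AUTO leaves the bound at int_oo. A self-contained toy (not the PR's code) showing how an explicit max on the Dim satisfies that requirement:

```python
# Toy example, not the PR's code: an explicit max on the Dim gives the
# exported program a finite upper bound for dim 1 instead of int_oo,
# which is what the ExecuTorch lowering asks for.
import torch


class Toy(torch.nn.Module):
    def forward(self, encoder_hidden_states: torch.Tensor) -> torch.Tensor:
        return encoder_hidden_states * 2


seq_dim = torch.export.Dim("encoder_hidden_seq_length", max=4096)
ep = torch.export.export(
    Toy(),
    (torch.rand(1, 1500, 384),),
    dynamic_shapes={"encoder_hidden_states": {1: seq_dim}},
)
print(ep)
```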
guangy10
left a comment
Putting it back in your queue. Feel free to ping me for review on Discord or tag me here.
Re-run tests on CI
Looks great! Can you fix the linter by running
Co-authored-by: Guang Yang <42389959+guangy10@users.noreply.github.com>
Hi!
First of all, thanks for taking the time to build this project. I think it's a great step toward a better DX with ExecuTorch!
As per the discussions in the React Native ExecuTorch channel on the ET Discord, this PR makes it possible to export Whisper using optimum-executorch.